Regular expressions in R are used mainly to process text data: searching, replacing, and so on.
First, some functions commonly used when working with text:
String splitting: strsplit()
String concatenation: paste(), paste0()
String length: nchar() counts the characters in each string; length() returns the number of elements in a vector
Substring extraction: substr(), substring()
Checking whether a pattern occurs: grep(p, x), grepl(p, x)
For example: s ... "123abc\\456", "abc123edf" ...
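A minimal sketch of the functions listed above, using the two sample strings from the snippet. The vector name s comes from the text; the other variable names are made up for illustration.

```r
# Two sample strings (the first contains one literal backslash)
s <- c("123abc\\456", "abc123edf")

# Split, join
parts  <- strsplit(s[2], split = "123")[[1]]   # "abc" "edf"
joined <- paste("abc", "edf", sep = "-")       # "abc-edf"
glued  <- paste0("abc", "edf")                 # "abcedf"

# nchar() counts characters per element; length() counts elements
n_chars    <- nchar(s)     # 10 9
n_elements <- length(s)    # 2

# Extract the first three characters of each element
head3 <- substr(s, start = 1, stop = 3)        # "123" "abc"

# grep() returns the indices of matching elements, grepl() one logical per element
idx   <- grep("^abc", s)    # 2
flags <- grepl("^abc", s)   # FALSE TRUE
```

Note that the backslash is doubled in the string literal: R first unescapes the literal, so the stored string holds a single backslash.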
..., author and other relations
# Article release time
sample(R_blog[grep("Posted:", R_blog)], 10)
# Article title
sample(R_blog[grep("Posted:", R_blog) - 2], 10)
# Article author
sample(R_blog[grep("Posted:", R_blog) + 3], 5)

# 3. Based on the information above, split the long text (4.5M) by article
# 4. Use library() calls as the criterion to see which libraries have been popular recently
Library_list ... "^library\\(", R_blog)], "\\(|\\)|\\,")
Library_list ... ])
Library_list ... "\\p{p}", "", lib...
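The snippet above relies on grep() returning line positions, so that fixed offsets from each "Posted:" line reach the title and author lines. A small sketch of that idea follows; the contents of R_blog here are a hypothetical stand-in (the original is a scraped 4.5M text), and the offsets -2 and +3 are taken from the snippet.

```r
# Hypothetical stand-in for R_blog: one line of scraped text per element
R_blog <- c(
  "First title", "", "Posted: 2016-01-01", "", "", "By: author A",
  "Second title", "", "Posted: 2016-02-01", "", "", "By: author B"
)

# Positions of the date lines
posted <- grep("Posted:", R_blog)

dates   <- R_blog[posted]       # the "Posted:" lines themselves
titles  <- R_blog[posted - 2]   # two lines above each date
authors <- R_blog[posted + 3]   # three lines below each date
```

This only works when every article follows the same line layout; an offset that runs past either end of the vector would return NA or drop elements, so real data may need a bounds check.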
... in the data preprocessing phase you need to be fluent at manipulating string objects. Of course, if you are more comfortable with other tools, such as Python, you can let them handle the dirty work of that early stage.
Substring extraction: substr() takes a subset of a given string object; its parameters are the start and end positions of the subset.
String substitution: gsub() searches a string for a given expression and replaces every match with new content. The ...
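As an illustration of the two functions just described, here is a small preprocessing sketch. The input vector raw and its format are invented for the example.

```r
# Hypothetical raw records: a 10-character date followed by a formatted amount
raw <- c("2016-01-05 sales: 1,200", "2016-02-11 sales: 980")

# substr(): take characters 1 through 10 of each element
dates <- substr(raw, start = 1, stop = 10)

# gsub(): drop everything up to and including "sales: "
amounts_text <- gsub("^.*sales: ", "", raw)

# gsub() again: remove thousands separators, then convert to numbers
amounts <- as.numeric(gsub(",", "", amounts_text))
```

Both functions are vectorized, so they process every element of raw without an explicit loop.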
substr
String extraction function
substr(x = "hello", start = 1, stop = 2)
[1] "he"
strsplit
Splits a string on a delimiter and returns a list
strsplit("abc", split = "")
[[1]]
[1] "a" "b" "c"
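Because the split argument of strsplit() is treated as a regular expression by default, the delimiter can be a pattern rather than a fixed string. A brief sketch with made-up inputs:

```r
# Split on runs of digits
by_digits <- strsplit("a1b22c", split = "[0-9]+")[[1]]        # "a" "b" "c"

# "." is a regex metacharacter, so a literal dot needs fixed = TRUE ...
by_dot_fixed <- strsplit("a.b.c", split = ".", fixed = TRUE)[[1]]

# ... or an escaped dot (backslash doubled inside the R string literal)
by_dot_regex <- strsplit("a.b.c", split = "\\.")[[1]]
```

The [[1]] selects the first list element, since strsplit() always returns a list even for a single input string.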
Take care when applying this function to a vector: the result is a list with one element per input string.
# Split each element of a vector and take the first piece of each split
unlist(lapply(X = c("abc", "bcd", "dfafadf"), FUN = function(x) {return(strsplit(x, split = "")[[1]][1])}))
[1] "a" "b" "d"
gsub and sub
String replacement
... a vector, you need to take care.
# Split each element of the vector and take the first piece of each split
unlist(lapply(X = c("abc", "bcd", "dfafadf"), FUN = function(x) {return(strsplit(x, split = "")[[1]][1])}))
[1] "a" "b" "d"
gsub and sub
String substitution
gsub replaces all matches
sub replaces only the first match
# Replace b with B
gsub(pattern = "b", replacement = "B", x = "baby")
[1] "BaBy"
gsub(pattern = "b", replacement = "B", x = c("abcb", "boy.", "baby"))
[1] "aBcB" "Boy." "BaBy"
# Replace only the first b
sub(pattern = "b", replacement = "B", x = "baby")
[1] "Baby"
S...
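The difference between the two functions can be seen side by side on the same vector. This sketch reuses the example strings from the snippet; the fixed = TRUE line is an addition for illustration.

```r
x <- c("abcb", "boy.", "baby")

# gsub(): every match in each element is replaced
all_b <- gsub(pattern = "b", replacement = "B", x = x)    # "aBcB" "Boy." "BaBy"

# sub(): only the first match in each element is replaced
first_b <- sub(pattern = "b", replacement = "B", x = x)   # "aBcb" "Boy." "Baby"

# "." would match any character as a regex; fixed = TRUE treats it literally
no_dot <- gsub(".", "", x, fixed = TRUE)                  # "abcb" "boy" "baby"
```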
... braces {} are used together with "|"
Note: reserved characters must be escaped with the escape character \
Meanings of common special escape sequences:
\n: line feed
\t: tab
\w: any letter, digit, or underscore, i.e. [a-zA-Z0-9_]
\W: the negation of \w, i.e. [^a-zA-Z0-9_]
\d: any digit, i.e. [0-9]
\D: the negation of \d, i.e. [^0-9]
\s: any whitespace character, such as space, tab, or newline
\S: the negation of \s, i.e. any non-whitespace character
Common regular expression functions
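The escape sequences above can be tried directly in R. One point worth keeping in mind: inside an R string literal the backslash itself must be doubled, so the regex \d is written "\\d". The sample string below is invented for the sketch.

```r
# Sample text containing an underscore, a tab, a space, a dot, and a newline
x <- "id_42:\tR 4.3\n"

digits_only <- gsub("\\D", "", x)   # drop every non-digit: "4243"
no_space    <- gsub("\\s", "", x)   # drop spaces, tabs, newlines: "id_42:R4.3"
word_chars  <- gsub("\\W", "", x)   # keep letters, digits, underscore: "id_42R43"

# A reserved character such as "." has to be escaped to match literally
has_dot <- grepl("\\.", x)          # TRUE
```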
grepl: ret...
1. re.sub?
Signature: re.sub(pattern, repl, string, count=0, flags=0)
Docstring: Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.
Parameter description:
pattern: the pattern string; groups can be referred to by number or by name (\g...
repl: the replacement string; can also be a function
string: the source str...
... uppercase characters)
– [a-zA-Z]: any single English letter
– [a-z]+: one or more lowercase English letters
|: or
Parentheses () and curly braces {} are used together with "|"
Special note: reserved characters must be escaped with the escape character \
Common meanings of special escape sequences:
\n: line break
\t: tab
\w: any letter, digit, or underscore, i.e. [a-zA-Z0-9_]
\W: the negation of \w, i.e. [^a-zA-Z0-9_]
\d: any digit, i.e. [0-9]
\D: the negation of \d, i.e. [^0-9]
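A short R sketch of the character classes, the + quantifier, alternation with "|" inside parentheses, and a {} repetition count. The input words are made up, and the results assume an ordinary locale in which [a-z] covers only lowercase ASCII letters.

```r
x <- c("cat", "Dog", "bird42", "fish")

# [a-zA-Z]: the first character is an English letter
starts_with_letter <- grepl("^[a-zA-Z]", x)     # TRUE TRUE TRUE TRUE

# [a-z]+: the whole string is one or more lowercase letters
all_lower <- grepl("^[a-z]+$", x)               # TRUE FALSE FALSE TRUE

# (a|b): alternation grouped with parentheses
cat_or_dog <- grepl("^(cat|Dog)$", x)           # TRUE TRUE FALSE FALSE

# {n}: exactly three letters
three_letters <- grepl("^[a-zA-Z]{3}$", x)      # TRUE TRUE FALSE FALSE
```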